Abstract

In our recent work [30, 33] we considered solving under-determined systems of linear equations with sparse solutions. In a large-dimensional and statistical context we proved results related to the performance of a polynomial $\ell_1$-optimization technique when used for solving such systems. As one of the tools we used a probabilistic result of Gordon [18]. In this paper we revisit this classic result in its core form and show how it can be reused to, in a sense, prove its own optimality.
1 Introduction

We start by looking back at the problem that we considered in a series of recent works [30, 32, 33]. It essentially boils down to finding sparse solutions of under-determined systems of linear equations. In more precise mathematical language, we would like to find a $k$-sparse $\mathbf{x}$ such that

$$ A\mathbf{x} = \mathbf{y} \qquad (1) $$

where $A$ is an $m \times n$ ($m < n$) matrix and $\mathbf{y}$ is an $m \times 1$ vector (here and in the rest of the paper, by a $k$-sparse vector we mean a vector that has at most $k$ nonzero components). Of course, the assumption will be that such an $\mathbf{x}$ exists.

To make writing in the rest of the paper easier, we will assume the so-called linear regime, i.e. we will assume that $k = \beta n$ and that the number of equations is $m = \alpha n$, where $\alpha$ and $\beta$ are constants independent of $n$ (more on the non-linear regime, i.e. on the regime in which $m$ is larger than linearly proportional to $k$, can be found in e.g. [9, 16, 17]).

A particularly successful technique for solving (1) is a linear programming relaxation called $\ell_1$-optimization. (Variations of the standard $\ell_1$-optimization from e.g. [7, 8, 27], as well as those from [10, 15, 19-21, 26] related to $\ell_q$-optimization, $0 < q < 1$, are possible as well.) The basic $\ell_1$-optimization algorithm finds $\mathbf{x}$ in (1) by solving the following $\ell_1$-norm minimization problem

$$ \min \|\mathbf{x}\|_1 \quad \text{subject to} \quad A\mathbf{x} = \mathbf{y}. \qquad (2) $$
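For readers who want to experiment, problem (2) can be rewritten as a standard linear program by splitting $\mathbf{x} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \geq 0$. The following is a minimal sketch of that reformulation, assuming NumPy and SciPy's `linprog` with the HiGHS backend are available; the function name `l1_minimize` and the problem sizes are illustrative choices, not part of the original work.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve min ||x||_1 subject to Ax = y, i.e. problem (2), as a linear program.

    Writing x = u - v with u, v >= 0, the objective sum(u) + sum(v) equals ||x||_1
    at any optimum, because an optimal LP solution never has u_i > 0 and v_i > 0 at once.
    """
    m, n = A.shape
    c = np.ones(2 * n)                     # cost vector for the stacked variable [u; v]
    A_eq = np.hstack([A, -A])              # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(f"LP solver failed: {res.message}")
    return res.x[:n] - res.x[n:]


# Illustrative instance (sizes chosen for speed): alpha = m/n = 0.5, beta = k/n = 0.05.
rng = np.random.default_rng(0)
n, m, k = 200, 100, 10
A = rng.standard_normal((m, n))            # i.i.d. standard normal A, as in Theorem 1
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)        # k-sparse unknown vector
y = A @ x

x_hat = l1_minimize(A, y)
print("max |x_hat - x| =", np.max(np.abs(x_hat - x)))
```

With $k/m = 0.1$, this instance sits well inside the region where recovery is expected, so the LP solution should match the planted sparse vector up to solver tolerance.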

Due to its popularity, the literature on the use of the above algorithm is growing rapidly. Below we restrict our attention to the two works that, in our view, are the most influential ones related to (2).



The first one is [6], where the authors were able to show that if $\alpha$ and $n$ are given, and $A$ is given and satisfies the restricted isometry property (RIP) (more on this property can be found in e.g. [1, 3, 5, 6, 24]), then any unknown vector $\mathbf{x}$ with no more than $k = \beta n$ nonzero elements (where $\beta$ is a constant dependent on $\alpha$ and explicitly calculated in [6]) can be recovered by solving (2). As expected, this assumes that $\mathbf{y}$ was in fact generated by that $\mathbf{x}$ and given to us.

However, the RIP is only a sufficient condition for $\ell_1$-optimization to produce the $k$-sparse solution of (1). Instead of characterizing $A$ through the RIP condition, in [11, 12] Donoho looked at its geometric properties/potential. Namely, in [11, 12] Donoho considered the polytope obtained by projecting the regular $n$-dimensional cross-polytope $C_p^n$ by $A$. He then established that the solution of (2) will be the $k$-sparse solution of (1) if and only if $AC_p^n$ is centrally $k$-neighborly (for the definitions of neighborliness, details of Donoho's approach, and related results, the interested reader can consult the by now classic references [11-14]). In a nutshell, using the results of [2, 4, 22, 23, 34], it is shown in [12] that if $A$ is a random $m \times n$ ortho-projector matrix then with overwhelming probability $AC_p^n$ is centrally $k$-neighborly (as usual, by overwhelming probability we in this paper mean a probability that differs from 1 by no more than a quantity decaying exponentially in $n$). Miraculously, [11, 12] provided a precise characterization of $m$ and $k$ (in a large-dimensional context) for which this happens.

In a series of our own works (see, e.g. [32, 33]) we then created an alternative probabilistic approach which was capable of providing the precise characterization between $m$ and $k$ that guarantees success/failure of (2) when used for finding the $k$-sparse solution of (1). The approach was a combination of geometric and purely probabilistic ideas and used a number of tools from classical probability theory (most notably a couple of results of Gordon from [18] that we will revisit in this paper). The following theorem summarizes the results we obtained in e.g. [32].



Theorem 1. (Exact threshold) Let $A$ be an $m \times n$ matrix in (1) with i.i.d. standard normal components. Let the unknown $\mathbf{x}$ in (1) be $k$-sparse. Further, let the locations and signs of the nonzero elements of $\mathbf{x}$ be arbitrarily chosen but fixed. Let $k, m, n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$ be constants independent of $m$ and $n$. Let erfinv be the inverse of the standard error function associated with a zero-mean unit-variance Gaussian random variable. Further, let all $\epsilon$'s below be arbitrarily small constants.



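1. Let $\hat{\theta}_w$ ($\beta_w \leq \hat{\theta}_w \leq 1$) be the solution of

$$ (1-\epsilon_1^{(c)})(1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}\,e^{-\left(\mathrm{erfinv}\left(\frac{1-\theta_w}{1-\beta_w}\right)\right)^2}}{\theta_w} - \sqrt{2}\,\mathrm{erfinv}\left((1+\epsilon_1^{(c)})\frac{1-\theta_w}{1-\beta_w}\right) = 0. \qquad (3) $$

If $\alpha$ and $\beta_w$ further satisfy

$$ \alpha > \frac{1-\beta_w}{\sqrt{2\pi}}\left(\sqrt{2\pi} + 2\,\frac{\sqrt{2\left(\mathrm{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}}{e^{\left(\mathrm{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}} - \sqrt{2\pi}\,\frac{1-\hat{\theta}_w}{1-\beta_w}\right) + \beta_w - \frac{\left((1-\beta_w)\sqrt{\frac{2}{\pi}}\,e^{-\left(\mathrm{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}\right)^2}{\hat{\theta}_w} \qquad (4) $$

then with overwhelming probability the solution of (2) is the $k$-sparse $\mathbf{x}$ from (1).

2. Let $\hat{\theta}_w$ ($\beta_w \leq \hat{\theta}_w \leq 1$) be the solution of

$$ (1+\epsilon_1^{(c)})(1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}\,e^{-\left(\mathrm{erfinv}\left(\frac{1-\theta_w}{1-\beta_w}\right)\right)^2}}{\theta_w} - \sqrt{2}\,\mathrm{erfinv}\left((1-\epsilon_1^{(c)})\frac{1-\theta_w}{1-\beta_w}\right) = 0. \qquad (5) $$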



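If on the other hand $\alpha$ and $\beta_w$ satisfy

$$ \alpha < \frac{1}{1+\epsilon_1^{(m)}}\left(\frac{1-\beta_w}{\sqrt{2\pi}}\left(\sqrt{2\pi} + 2\,\frac{\sqrt{2\left(\mathrm{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}}{e^{\left(\mathrm{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}} - \sqrt{2\pi}\,\frac{1-\hat{\theta}_w}{1-\beta_w}\right) + \beta_w - \frac{\left((1-\beta_w)\sqrt{\frac{2}{\pi}}\,e^{-\left(\mathrm{erfinv}\left(\frac{1-\hat{\theta}_w}{1-\beta_w}\right)\right)^2}\right)^2}{\hat{\theta}_w}\,(1+\epsilon_1^{(g)})^{-2}\right) \qquad (6) $$

then with overwhelming probability there will be a $k$-sparse $\mathbf{x}$ (from a set of $\mathbf{x}$'s with fixed locations and signs of nonzero components) that satisfies (1) and is not the solution of (2).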

Proof. The first part was established in [33] and the second one was established in [30]. An alternative way of establishing the same set of results was also presented in [29]. □

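Below we provide a more informal interpretation of what was established by the above theorem. Assume the setup of the above theorem. Let $\alpha_w$ and $\beta_w$ satisfy the following:

Fundamental characterization of the $\ell_1$ performance:

$$ (1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}\,e^{-\left(\mathrm{erfinv}\left(\frac{1-\alpha_w}{1-\beta_w}\right)\right)^2}}{\alpha_w} - \sqrt{2}\,\mathrm{erfinv}\left(\frac{1-\alpha_w}{1-\beta_w}\right) = 0. \qquad (7) $$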

Then:

1. If $\alpha > \alpha_w$ then with overwhelming probability the solution of (2) is the $k$-sparse $\mathbf{x}$ from (1).

2. If $\alpha < \alpha_w$ then with overwhelming probability there will be a $k$-sparse $\mathbf{x}$ (from a set of $\mathbf{x}$'s with fixed locations and signs of nonzero components) that satisfies (1) and is not the solution of (2).
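Since (7) defines $\alpha_w$ only implicitly, it may help to see how it can be evaluated numerically. The sketch below assumes SciPy's `scipy.special.erfinv`, and its helper names are hypothetical; it solves (7) for $\alpha_w$ by bisection on $(\beta_w, 1)$. The left-hand side of (7) tends to $-\infty$ as $\alpha_w \to \beta_w$ and to $(1-\beta_w)\sqrt{2/\pi} > 0$ as $\alpha_w \to 1$, so a sign change is guaranteed.

```python
import math

from scipy.special import erfinv


def fundamental_residual(alpha_w: float, beta_w: float) -> float:
    """Left-hand side of (7) as a function of alpha_w, for fixed beta_w."""
    x = erfinv((1.0 - alpha_w) / (1.0 - beta_w))
    return ((1.0 - beta_w) * math.sqrt(2.0 / math.pi) * math.exp(-x * x) / alpha_w
            - math.sqrt(2.0) * x)


def weak_threshold_alpha(beta_w: float, tol: float = 1e-12) -> float:
    """Solve (7) for alpha_w in (beta_w, 1) by bisection.

    The residual is negative just above beta_w (erfinv blows up) and positive
    just below 1 (erfinv -> 0), so the bracket always contains a sign change.
    """
    lo, hi = beta_w + 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fundamental_residual(mid, beta_w) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha_w = weak_threshold_alpha(0.1924)
print(f"alpha_w = {alpha_w:.4f}, predicted k/m = {0.1924 / alpha_w:.3f}")
```

For example, $\beta_w = 0.1924$ gives $\alpha_w$ close to $0.5$, i.e. at $\alpha = 0.5$ the predicted threshold is $k/m \approx 0.385$.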

As mentioned above, to establish the results given in Theorem 1 we used a couple of classic probabilistic results from [18]. In the following section we will recall these results and see how they can be reconnected and, in a way, optimized.

We organize the rest of the paper in the following way. In Section 2 we introduce and briefly discuss the two theorems from [18] that we plan to revisit in this paper, while in Section 3 we create the mechanism for optimizing the second of these theorems in certain scenarios. Finally, in Sections 4 and 5 we discuss the obtained results.

2 Key theorems 

In this section we introduce the above mentioned theorems that will be of key importance in our subsequent 
considerations. 

First we recall the following result from [18] that relates to statistical properties of certain Gaussian processes.

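Theorem 2. ([18]) Let $X_{ij}$ and $Y_{ij}$, $1 \leq i \leq n$, $1 \leq j \leq m$, be two centered Gaussian processes which satisfy the following inequalities for all choices of indices

1. $E(X_{ij}^2) = E(Y_{ij}^2)$

2. $E(X_{ij}X_{ik}) \geq E(Y_{ij}Y_{ik})$

3. $E(X_{ij}X_{lk}) \leq E(Y_{ij}Y_{lk})$, $i \neq l$.

Then

$$ P\left(\bigcap_i \bigcup_j (X_{ij} \geq \lambda_{ij})\right) \leq P\left(\bigcap_i \bigcup_j (Y_{ij} \geq \lambda_{ij})\right). $$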



Based on the above theorem, Gordon then went further and proved a more specific type of result, now widely known as the "escape through a mesh" theorem. The result essentially looks at a particular class of Gaussian processes and connects them with the geometry of random subspaces and their intersections with given fixed subsets of high-dimensional unit spheres.

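Theorem 3. ([18] Escape through a mesh) Let $S$ be a subset of the unit Euclidean sphere $S^{n-1}$ in $\mathbb{R}^n$. Let $Y$ be a random $(n-m)$-dimensional subspace of $\mathbb{R}^n$, distributed uniformly in the Grassmannian with respect to the Haar measure. Let

$$ w_D(S) = E \sup_{\mathbf{w} \in S} (\mathbf{g}^T\mathbf{w}) \qquad (8) $$

where $\mathbf{g}$ is a random column vector in $\mathbb{R}^n$ with i.i.d. $\mathcal{N}(0,1)$ components. Assume that $w_D(S) < \sqrt{m} - \frac{1}{4\sqrt{m}}$. Then

$$ P(Y \cap S = \emptyset) > 1 - 3.5\, e^{-\left(\sqrt{m} - \frac{1}{4\sqrt{m}} - w_D(S)\right)^2/18}. \qquad (9) $$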



Remark: Gordon's original constant 3.5 was replaced by 2.5 in [25]. Neither constant is the subject of our detailed considerations. However, we do mention in passing that, to the best of our knowledge, it is an open problem to determine the exact value of this constant, as well as to improve and ultimately determine the exact value of the somewhat high constant 18.
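To make the quantities in Theorem 3 concrete, the sketch below (an illustration only; the cap example and all function names are our own assumptions) takes $S$ to be the spherical cap $\{\mathbf{w} \in S^{n-1} : w_1 \geq t\}$, for which $\max_{\mathbf{w} \in S} \mathbf{g}^T\mathbf{w}$ has a closed form. It estimates $w_D(S)$ in (8) by Monte Carlo and plugs the estimate into the lower bound (9), reporting 0 when the hypothesis $w_D(S) < \sqrt{m} - \frac{1}{4\sqrt{m}}$ fails, since the theorem then says nothing.

```python
import numpy as np


def cap_max(g: np.ndarray, t: float) -> np.ndarray:
    """Row-wise max of g^T w over the cap S = {||w||_2 = 1, w_1 >= t}, 0 < t < 1.

    If g/||g|| lies in the cap, the maximum is ||g||; otherwise the maximizer lies on
    the boundary w_1 = t and the maximum is t*g_1 + sqrt(1 - t^2)*||(g_2, ..., g_n)||.
    """
    norm = np.linalg.norm(g, axis=1)
    boundary = t * g[:, 0] + np.sqrt(1.0 - t * t) * np.linalg.norm(g[:, 1:], axis=1)
    return np.where(g[:, 0] >= t * norm, norm, boundary)


def gaussian_width_cap(n: int, t: float, samples: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of w_D(S) = E max_{w in S} g^T w from (8)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, n))
    return float(cap_max(g, t).mean())


def gordon_escape_bound(w_d: float, m: int) -> float:
    """Lower bound (9) on P(Y cap S = empty); returns 0.0 when Theorem 3 does not apply."""
    gap = np.sqrt(m) - 1.0 / (4.0 * np.sqrt(m)) - w_d
    if gap <= 0:
        return 0.0
    return float(max(0.0, 1.0 - 3.5 * np.exp(-gap ** 2 / 18.0)))


w_d = gaussian_width_cap(n=400, t=0.8)
for m in (100, 300, 350):
    print(m, round(w_d, 3), gordon_escape_bound(w_d, m))
```

For $n = 400$ and $t = 0.8$ the estimate is close to $0.6\sqrt{399} \approx 12$, so the bound becomes informative only once $\sqrt{m}$ exceeds $w_D(S)$ by a few units (here, $m$ above roughly 280), which reflects the constants 3.5 and 18.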

In more informal language, what Theorem 3 creates is a route connecting the location of a low-dimensional random subspace with respect to a given body and a seemingly simple quantity $w_D(S)$. Then, as long as one can get a handle on $w_D(S)$ and the dimensions are large enough, one can get a pretty good feeling whether a random flat will hit or miss the given body $S$. There are a couple of restrictions, though. What we informally call a body is not really a body but rather a subset of the unit $n$-dimensional sphere, and what we call a random flat is not really "just" a random flat but actually a subspace chosen uniformly at random from the Grassmannian (one can think of it as a uniformly random choice among all subspaces of dimension $n - m$). We believe that it is easier to get a real feeling for the power of Gordon's results if one, for a moment, leaves technicalities out of the picture and instead views things in a more informal way.

Along the same lines, in our opinion, to fully understand the remarkable importance of Theorem 3 a good starting point is a firm understanding of the original geometric question that it answers. The question is incredibly simple: there is a set $S$ which is a subset of the sphere $S^{n-1}$ in $\mathbb{R}^n$. One then generates a uniformly random subspace (as we said above, in this paper, when we talk about uniformly random spaces/subspaces we of course view such randomness in a Grassmannian sense) of dimension, say, $n - m$ (where of course $m > 0$) and wonders how likely it is that such a subspace will intersect $S$. One simple example that could help visualize these high-dimensional geometric concepts would be to take $n = 3$ and look at a spherical cap of the sphere $S^2$ in $\mathbb{R}^3$. Then one can choose, say, $n - m = 1$ and basically wonder how likely it is that a random line through the origin intersects such a spherical cap. Of course, when $S$ is a spherical cap (smaller than a hemisphere) the answer is simple and can be obtained through a simple geometric consideration: since a line through the origin meets the sphere in two antipodal points, the probability is twice the ratio of the spherical cap's area to the area of the entire unit sphere. On the other hand, geometrically speaking, it is immediately clear how much harder the question becomes if $S$ is not a spherical cap and $n$ and $n - m$ are large.
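The $n = 3$ example can be checked numerically. The sketch below is illustrative only: it assumes the cap is $\{\mathbf{w} \in S^2 : w_1 \geq t\}$ with $0 < t < 1$, draws a uniformly random line through the origin by normalizing a standard Gaussian vector, and compares the hit frequency with twice the area ratio, $2 \cdot \frac{2\pi(1-t)}{4\pi} = 1 - t$. The function name is hypothetical.

```python
import numpy as np


def line_hits_cap_frequency(t: float, samples: int = 200_000, seed: int = 0) -> float:
    """Fraction of uniformly random lines through the origin in R^3 that meet the cap w_1 >= t."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((samples, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform direction on S^2
    # The line {s*u} meets S^2 at +u and -u, so it hits the cap iff |u_1| >= t.
    return float(np.mean(np.abs(u[:, 0]) >= t))


t = 0.5
estimate = line_hits_cap_frequency(t)
cap_area = 2.0 * np.pi * (1.0 - t)          # Archimedes: area of {w in S^2 : w_1 >= t}
exact = 2.0 * cap_area / (4.0 * np.pi)       # twice the area ratio, i.e. 1 - t
print(estimate, exact)
```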

If one then looks back at the original question, which, as we discussed above, is purely geometric, it seems almost unbelievable (at least a priori) that it can be transformed into a purely analytical problem.



The incredible contribution of Theorem 3 lies exactly in its success in creating such a transformation and in effectively connecting this geometric question on one side with the properties of Gaussian processes on the other. The idea of moving everything to the analysis terrain is great on its own; what is more astonishing, however, is that one can often actually accomplish it. Still, when one moves to the analysis terrain there are several questions one should be able to tackle (the problem may seem a bit easier when transferred to the analysis terrain, but nobody guarantees that it is actually easy!). The two questions that we found most pressing are: 1) Can one get a handle on $w_D(S)$ for any $S$? 2) Roughly speaking, the theorem only specifies what will happen if $w_D(S) < \sqrt{m}$. Is there a definite answer as to what will happen if $w_D(S) > \sqrt{m}$?

When it comes to answering the first question, it does not seem that the answer would be yes. Still, experience says that for many "practical" sets $S$ one can actually handle $w_D(S)$ (see, e.g. [31, 33]). Even if computing the exact value of $w_D(S)$ may not be feasible, there are possible alternatives. For example, one can try to bound $w_D(S)$ and in that way provide at least some kind of answer to the original geometric question. On the other hand, when it comes to the second question one could envision two possible scenarios. Assuming that the answer to question 1) is no, one may start looking at particular sets $S$ and wonder for which sets $S$ the quantity $w_D(S)$ can be handled. The first scenario would then be to look at those $S$ for which $w_D(S)$ is not computable. Then, even if one could give a definite answer to question 2), the whole concept would appear as a raw theory without final analytical concreteness. The second scenario would, on the other hand, relate to those $S$ for which $w_D(S)$ can be computed. This scenario is probably the natural next direction for further studies of Theorem 3. In the following section we look at this very scenario and observe that for certain $S$ one can actually provide a definite answer to question 2).

3 Revisiting Escape through a mesh theorem 

In the first part of this section we will look at a couple of technical details that relate to quantities from Theorem 3. We first revisit $w_D(S)$. As stated in Theorem 3, $w_D(S)$ is given as

$$ w_D(S) = E \sup_{\mathbf{w} \in S} (\mathbf{g}^T\mathbf{w}). \qquad (10) $$

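To be a bit more specific, we will assume that the set $S$ can be described through a functional inequality, i.e. we will say that

$$ S = \{\mathbf{w} \,|\, \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0\}. \qquad (11) $$

We will then accordingly replace $w_D(S)$ by

$$ w_D(f) = E \sup_{\mathbf{w}} \; \mathbf{g}^T\mathbf{w} \quad \text{subject to} \quad \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0. \qquad (12) $$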

3.1 Deterministic view 

Clearly, to gain complete control over $w_D(f)$ (and basically $w_D(S)$) one ultimately has to consider its random origin. However, before dealing with the randomness of the problem, we will try to provide more information about $w_D(f)$ (and a couple of other deterministic quantities that will be introduced below) on a deterministic level. Along these lines, to distinguish between the deterministic and random portions of the nature of $w_D(f)$, we introduce the quantity $w(f,\mathbf{g})$ as

$$ w(f,\mathbf{g}) = \sup_{\mathbf{w}} \; \mathbf{g}^T\mathbf{w} \quad \text{subject to} \quad \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0. \qquad (13) $$

Then clearly

$$ w_D(S) = w_D(f) = E\, w(f,\mathbf{g}). \qquad (14) $$

As mentioned above, for the time being we will focus on $w(f,\mathbf{g})$. Also, to make the presentation easier, we will assume that the sup in (8), (10), (12), and (13) can be replaced with a max (also, on all other occasions in the paper where a sup would be more precise, we will assume that the scenarios are such that a max can replace it). Then (13) can be rewritten as

$$ w(f,\mathbf{g}) = \max_{\mathbf{w}} \; \mathbf{g}^T\mathbf{w} \quad \text{subject to} \quad \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0. \qquad (15) $$



Transforming (15) a bit further one gets

$$ w(f,\mathbf{g}) = -\min_{\mathbf{w}} \; -\mathbf{g}^T\mathbf{w} \quad \text{subject to} \quad \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0. \qquad (16) $$

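Using a Lagrange multiplier, one can move the constraint on $f(\mathbf{w})$ into the objective

$$ w(f,\mathbf{g}) = -\min_{\|\mathbf{w}\|_2=1} \max_{\lambda \geq 0} \left(-\mathbf{g}^T\mathbf{w} + \lambda f(\mathbf{w})\right). \qquad (17) $$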

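Since a min-max is never smaller than the corresponding max-min, one then easily has

$$ w(f,\mathbf{g}) \leq -\max_{\lambda \geq 0} \min_{\|\mathbf{w}\|_2=1} \left(-\mathbf{g}^T\mathbf{w} + \lambda f(\mathbf{w})\right), \qquad (18) $$

and

$$ w(f,\mathbf{g}) \leq \min_{\lambda \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda f(\mathbf{w})\right). \qquad (19) $$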
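As a sanity check of (19), one can take the illustrative cap function $f(\mathbf{w}) = t - w_1$ (our own choice of example), for which the inner maximization in (19) has the closed form $\max_{\|\mathbf{w}\|_2=1} (\mathbf{g}^T\mathbf{w} - \lambda f(\mathbf{w})) = \|\mathbf{g} + \lambda \mathbf{e}_1\|_2 - \lambda t$, a convex function of $\lambda$. The sketch below, with hypothetical function names, minimizes it over $\lambda \geq 0$ by ternary search and compares the result with the exact value of $w(f,\mathbf{g})$. For this particular $f$ a short calculation shows that the bound is tight, so the two printed numbers should agree up to numerical error; for a general $f$ only the inequality is guaranteed.

```python
import numpy as np


def cap_primal(g: np.ndarray, t: float) -> float:
    """w(f, g) = max g^T w over ||w||_2 = 1, f(w) = t - w_1 <= 0 (closed form)."""
    norm = np.linalg.norm(g)
    if g[0] >= t * norm:                  # unconstrained maximizer g/||g|| lies in the cap
        return float(norm)
    perp = np.linalg.norm(g[1:])          # otherwise the maximizer sits on the boundary w_1 = t
    return float(t * g[0] + np.sqrt(1.0 - t * t) * perp)


def lagrangian_max(g: np.ndarray, t: float, lam: float) -> float:
    """max_{||w||_2=1} g^T w - lam * (t - w_1) = ||g + lam e_1||_2 - lam t."""
    shifted = g.copy()
    shifted[0] += lam
    return float(np.linalg.norm(shifted) - lam * t)


def dual_bound(g: np.ndarray, t: float, iters: int = 200) -> float:
    """Right-hand side of (19): min over lam >= 0 by ternary search (objective is convex in lam)."""
    lo = 0.0
    hi = np.linalg.norm(g) * (1.0 + t / np.sqrt(1.0 - t * t)) + 1.0  # brackets the minimizer
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if lagrangian_max(g, t, a) <= lagrangian_max(g, t, b):
            hi = b
        else:
            lo = a
    return lagrangian_max(g, t, 0.5 * (lo + hi))


rng = np.random.default_rng(1)
t = 0.6
for _ in range(5):
    g = rng.standard_normal(50)
    print(cap_primal(g, t), dual_bound(g, t))
```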

We will now leave the deterministic portion of the analysis of $w_D(S)$ (or, to be more precise, the analysis of $w(f,\mathbf{g})$) for a moment and switch to a seemingly different optimization problem. Namely, we will consider the following deterministic optimization problem

$$ \tau(f,A) = \min_{\mathbf{w}} f(\mathbf{w}) \quad \text{subject to} \quad A\mathbf{w} = 0,\; \|\mathbf{w}\|_2 = 1, \qquad (20) $$

and through it introduce a new quantity $\tau(f,A)$. This quantity will be, in a way, an "almost" counterpart to $w(f,\mathbf{g})$. At this point the purpose of introducing such a quantity may not be clear. However, as we progress further it will become more apparent what its meaning is and why we introduced it. Here we only mention roughly that $\tau(f,A)$ can be thought of as an indicator of whether the intersection of the subspace $\{\mathbf{w} \,|\, A\mathbf{w} = 0\}$ with the unit sphere $\|\mathbf{w}\|_2 = 1$ contains points of $S$. Namely, if $\tau(f,A) \leq 0$ then indeed there is a $\mathbf{w}$ such that $A\mathbf{w} = 0$, $\|\mathbf{w}\|_2 = 1$, and $f(\mathbf{w}) \leq 0$; by the definition of $S$ in (11), such a $\mathbf{w}$ is actually in $S$. On the other hand, if $\tau(f,A) > 0$ there is no $\mathbf{w}$ such that $A\mathbf{w} = 0$, $\|\mathbf{w}\|_2 = 1$, and $f(\mathbf{w}) \leq 0$, and automatically the intersection of the subspace $\{\mathbf{w} \,|\, A\mathbf{w} = 0\}$ with the unit sphere misses the set $S$.
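For the same illustrative cap $f(\mathbf{w}) = t - w_1$ (an example of our choosing), problem (20) can be solved exactly: $\tau(f,A) = t - \|P\mathbf{e}_1\|_2$, where $P$ is the orthogonal projector onto the null space of $A$, because the largest first coordinate of a unit vector in a subspace equals the norm of the projection of $\mathbf{e}_1$ onto that subspace. The sketch below (function name hypothetical) computes an orthonormal null-space basis with NumPy's SVD, assuming $A$ has full row rank, which holds with probability one for a Gaussian $A$.

```python
import numpy as np


def tau_cap(A: np.ndarray, t: float) -> float:
    """tau(f, A) from (20) for f(w) = t - w_1, i.e. t - max{w_1 : Aw = 0, ||w||_2 = 1}."""
    m, n = A.shape
    _, _, vh = np.linalg.svd(A)                  # vh is n x n (full_matrices=True by default)
    null_basis = vh[m:]                          # orthonormal rows spanning {w : Aw = 0} if rank(A) = m
    # max w_1 over unit vectors of the null space equals ||P e_1||_2 = ||null_basis[:, 0]||_2
    return float(t - np.linalg.norm(null_basis[:, 0]))


rng = np.random.default_rng(2)
n, m = 400, 100
A = rng.standard_normal((m, n))
for t in (0.5, 0.95):
    tau = tau_cap(A, t)
    print(t, round(tau, 4), "meets S" if tau <= 0 else "misses S")
```

For $n = 400$ and $m = 100$ one has $\|P\mathbf{e}_1\|_2^2 \approx (n-m)/n = 0.75$, so the null space should reach the cap for $t = 0.5$ but not for $t = 0.95$.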

Going back to (20) and again using Lagrange multipliers, one can move the subspace constraint into the objective

$$ \tau(f,A) = \min_{\|\mathbf{w}\|_2=1} \max_{\boldsymbol{\nu}} \left(f(\mathbf{w}) + \boldsymbol{\nu}^T A\mathbf{w}\right). \qquad (21) $$

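Now, we will assume that the structure of the set $S$ is determined by a function $f(\mathbf{w})$ for which it also holds that

$$ \tau(f,A) = \min_{\|\mathbf{w}\|_2=1} \max_{\boldsymbol{\nu}} \left(f(\mathbf{w}) + \boldsymbol{\nu}^T A\mathbf{w}\right) = \max_{\boldsymbol{\nu}} \min_{\|\mathbf{w}\|_2=1} \left(f(\mathbf{w}) + \boldsymbol{\nu}^T A\mathbf{w}\right). \qquad (22) $$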

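In fact, as will be clear from the subsequent analysis, the property that we will mostly utilize is the sign of $\tau(f,A)$. Having that in mind, one can relax requirement (22) a bit:

$$ \mathrm{sign}(\tau(f,A)) = \mathrm{sign}\left(\min_{\|\mathbf{w}\|_2=1} \max_{\boldsymbol{\nu}} \left(f(\mathbf{w}) + \boldsymbol{\nu}^T A\mathbf{w}\right)\right) = \mathrm{sign}\left(\max_{\boldsymbol{\nu}} \min_{\|\mathbf{w}\|_2=1} \left(f(\mathbf{w}) + \boldsymbol{\nu}^T A\mathbf{w}\right)\right). \qquad (23) $$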

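Clearly, (22) or (23) will not hold for every $f(\mathbf{w})$ and every $A$. However, we will assume that there are $f(\mathbf{w})$ and $A$ for which they do hold. After rearranging (22) a bit we have

$$ -\tau(f,A) = \min_{\boldsymbol{\nu}} \max_{\|\mathbf{w}\|_2=1} \left(-f(\mathbf{w}) - \boldsymbol{\nu}^T A\mathbf{w}\right), \qquad (24) $$

and after rearranging (23) a bit we have

$$ -\mathrm{sign}(\tau(f,A)) = \mathrm{sign}\left(\min_{\boldsymbol{\nu}} \max_{\|\mathbf{w}\|_2=1} \left(-f(\mathbf{w}) - \boldsymbol{\nu}^T A\mathbf{w}\right)\right). \qquad (25) $$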

At this point one should note that while the quantities $w(f,\mathbf{g})$ and $\tau(f,A)$ are random, so far they have been treated as deterministic. In other words, we viewed them as functions of a fixed pair $(\mathbf{g}, A)$. Moreover, they are in good enough shape that we can switch to the probabilistic portion of their analysis. This portion will essentially determine the typical behavior of these two quantities when the components of $\mathbf{g}$ and $A$ are i.i.d. standard normals.

3.2 Probabilistic view 

To obtain a probabilistic view of the quantities $w(f,\mathbf{g})$ and $\tau(f,A)$ we will invoke the results of Theorem 2. We will do so through the following lemma, which is a slightly modified Lemma 3.1 from [18] (Lemma 3.1 is a direct consequence of Theorem 2 and the backbone of the escape through a mesh theorem).

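Lemma 1. Let $A$ be an $m \times n$ matrix with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $n \times 1$ and $m \times 1$ vectors, respectively, with i.i.d. standard normal components. Also, let $g$ be a standard normal random variable and let $\zeta_{\mathbf{w},\boldsymbol{\nu}}$ be a deterministic function of $\mathbf{w}$ and $\boldsymbol{\nu}$. Then

$$ P\left(\min_{\boldsymbol{\nu} \in \mathbb{R}^m \setminus 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} + \|\boldsymbol{\nu}\|_2\, g - \zeta_{\mathbf{w},\boldsymbol{\nu}}\right) > 0\right) \geq P\left(\min_{\boldsymbol{\nu} \in \mathbb{R}^m \setminus 0} \max_{\|\mathbf{w}\|_2=1} \left(\|\boldsymbol{\nu}\|_2\, \mathbf{g}^T\mathbf{w} + \mathbf{h}^T\boldsymbol{\nu} - \zeta_{\mathbf{w},\boldsymbol{\nu}}\right) > 0\right). \qquad (26) $$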



Proof. The proof is exactly the same as the one of Lemma 3.1 in [18]. □



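Let $\zeta_{\mathbf{w},\boldsymbol{\nu}} = \epsilon_5^{(g)}\sqrt{n}\,\|\boldsymbol{\nu}\|_2 + f(\mathbf{w})$, with $\epsilon_5^{(g)} > 0$ being an arbitrarily small constant independent of $n$. We will first look at the right-hand side of the inequality in (26). The probability of interest is then

$$ P\left(\min_{\boldsymbol{\nu} \in \mathbb{R}^m \setminus 0} \max_{\|\mathbf{w}\|_2=1} \left(\|\boldsymbol{\nu}\|_2\, \mathbf{g}^T\mathbf{w} + \mathbf{h}^T\boldsymbol{\nu} - \epsilon_5^{(g)}\sqrt{n}\,\|\boldsymbol{\nu}\|_2 - f(\mathbf{w})\right) > 0\right). \qquad (27) $$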

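After pulling out $\|\boldsymbol{\nu}\|_2$ one has

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(\|\boldsymbol{\nu}\|_2\, \mathbf{g}^T\mathbf{w} + \mathbf{h}^T\boldsymbol{\nu} - \epsilon_5^{(g)}\sqrt{n}\,\|\boldsymbol{\nu}\|_2 - f(\mathbf{w})\right) > 0\right) = P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \|\boldsymbol{\nu}\|_2\left(\mathbf{g}^T\mathbf{w} + \frac{\mathbf{h}^T\boldsymbol{\nu}}{\|\boldsymbol{\nu}\|_2} - \epsilon_5^{(g)}\sqrt{n} - \frac{f(\mathbf{w})}{\|\boldsymbol{\nu}\|_2}\right) > 0\right) $$

and then, since $\|\boldsymbol{\nu}\|_2 > 0$, easily

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(\|\boldsymbol{\nu}\|_2\, \mathbf{g}^T\mathbf{w} + \mathbf{h}^T\boldsymbol{\nu} - \epsilon_5^{(g)}\sqrt{n}\,\|\boldsymbol{\nu}\|_2 - f(\mathbf{w})\right) > 0\right) = P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} + \frac{\mathbf{h}^T\boldsymbol{\nu}}{\|\boldsymbol{\nu}\|_2} - \epsilon_5^{(g)}\sqrt{n} - \frac{f(\mathbf{w})}{\|\boldsymbol{\nu}\|_2}\right) > 0\right). $$

Replacing $\|\boldsymbol{\nu}\|_2$ with a scalar $\frac{1}{\lambda_\nu}$ and solving the minimization over different $\boldsymbol{\nu}$ with a fixed $\|\boldsymbol{\nu}\|_2$ one obtains

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(\|\boldsymbol{\nu}\|_2\, \mathbf{g}^T\mathbf{w} + \mathbf{h}^T\boldsymbol{\nu} - \epsilon_5^{(g)}\sqrt{n}\,\|\boldsymbol{\nu}\|_2 - f(\mathbf{w})\right) > 0\right) = P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right). \qquad (28) $$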
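The last step above compresses two observations. Spelled out (under the paper's standing assumption that the minima and maxima are attained, and additionally assuming $f$ is continuous), with $\boldsymbol{\nu} = \mathbf{u}/\lambda_\nu$, $\|\mathbf{u}\|_2 = 1$, $\lambda_\nu > 0$, it reads:

```latex
\begin{align*}
&\min_{\boldsymbol{\nu}\neq 0}\max_{\|\mathbf{w}\|_2=1}\Big(\mathbf{g}^T\mathbf{w}+\frac{\mathbf{h}^T\boldsymbol{\nu}}{\|\boldsymbol{\nu}\|_2}-\epsilon_5^{(g)}\sqrt{n}-\frac{f(\mathbf{w})}{\|\boldsymbol{\nu}\|_2}\Big)\\
&\qquad=\min_{\lambda_\nu>0}\,\min_{\|\mathbf{u}\|_2=1}\Big(\max_{\|\mathbf{w}\|_2=1}\big(\mathbf{g}^T\mathbf{w}-\lambda_\nu f(\mathbf{w})\big)+\mathbf{h}^T\mathbf{u}\Big)-\epsilon_5^{(g)}\sqrt{n}\\
&\qquad=\min_{\lambda_\nu>0}\max_{\|\mathbf{w}\|_2=1}\big(\mathbf{g}^T\mathbf{w}-\lambda_\nu f(\mathbf{w})\big)-\|\mathbf{h}\|_2-\epsilon_5^{(g)}\sqrt{n},
\end{align*}
```

where the last line uses $\min_{\|\mathbf{u}\|_2=1}\mathbf{h}^T\mathbf{u} = -\|\mathbf{h}\|_2$ (attained at $\mathbf{u} = -\mathbf{h}/\|\mathbf{h}\|_2$). For continuous $f$ on the compact unit sphere, $\max_{\|\mathbf{w}\|_2=1}(\mathbf{g}^T\mathbf{w}-\lambda_\nu f(\mathbf{w}))$ is Lipschitz in $\lambda_\nu$, so the minimum over $\lambda_\nu > 0$ can equivalently be taken over $\lambda_\nu \geq 0$, as in (28).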
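Since $\mathbf{h}$ is a vector of $m$ i.i.d. standard normal variables, it is rather trivial that $P(\|\mathbf{h}\|_2 < (1+\epsilon_1^{(m)})\sqrt{m}) \geq 1 - e^{-\epsilon_2^{(m)} m}$, where $\epsilon_1^{(m)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(m)}$ is a constant dependent on $\epsilon_1^{(m)}$ but independent of $n$. Then from (28) one obtains

$$ P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right) \geq (1 - e^{-\epsilon_2^{(m)} m})\, P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right). \qquad (29) $$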



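We now look at the left-hand side of the inequality in (26):

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} + \|\boldsymbol{\nu}\|_2\, g - \zeta_{\mathbf{w},\boldsymbol{\nu}}\right) > 0\right) = P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} - f(\mathbf{w}) + \|\boldsymbol{\nu}\|_2 \left(g - \epsilon_5^{(g)}\sqrt{n}\right)\right) > 0\right). \qquad (30) $$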

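Since $P(g < \epsilon_5^{(g)}\sqrt{n}) \geq 1 - e^{-\epsilon_6^{(g)} n}$ (where $\epsilon_6^{(g)}$ is, like all other $\epsilon$'s in this paper, independent of $n$), and since on the event $g < \epsilon_5^{(g)}\sqrt{n}$ the last term in (30) is negative while $g$ is independent of $A$, from (30) we have

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} + \|\boldsymbol{\nu}\|_2\, g - \zeta_{\mathbf{w},\boldsymbol{\nu}}\right) > 0\right) \leq (1 - e^{-\epsilon_6^{(g)} n})\, P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} - f(\mathbf{w})\right) > 0\right) + e^{-\epsilon_6^{(g)} n}. \qquad (31) $$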

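Connecting (26), (27), (28), (29), (30), and (31) we obtain

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} - f(\mathbf{w})\right) > 0\right) \geq \frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\, P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \qquad (32) $$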

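Let

$$ \xi(f,\mathbf{g}) = \min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) \qquad (33) $$

and

$$ \xi_D(f) = E\,\xi(f,\mathbf{g}). \qquad (34) $$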

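Using (24) and (33), (32) becomes

$$ P(-\tau(f,A) > 0) \geq \frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\, P\left(\xi(f,\mathbf{g}) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \qquad (35) $$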

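Now we will assume that $\xi(f,\mathbf{g})$ concentrates around $\xi_D(f)$ and that $\xi_D(f) \sim \sqrt{n}$, i.e.

$$ P\left(|\xi(f,\mathbf{g}) - \xi_D(f)| > \epsilon_1^{(\xi)}\,\xi_D(f)\right) \leq e^{-\epsilon_2^{(\xi)}\xi_D(f)}. \qquad (36) $$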



(This assumption can be avoided; however, in the interest of maintaining as simple a presentation as possible, we will state it.) Moreover, let

$$ (1-\epsilon_1^{(\xi)})\,\xi_D(f) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}. \qquad (37) $$

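Then from (35) we have

$$ P(-\tau(f,A) > 0) \geq \frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\left(1 - e^{-\epsilon_2^{(\xi)}\xi_D(f)}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \qquad (38) $$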

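Finally, if all assumptions we made indeed hold, then

$$ \lim_{n\to\infty} P(\tau(f,A) < 0) \geq \lim_{n\to\infty}\left(\frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\left(1 - e^{-\epsilon_2^{(\xi)}\xi_D(f)}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}\right) = 1. \qquad (39) $$

In other words, if (22) (or (23)), (36), and (37) hold, then for large $n$ one has with overwhelming probability that the random subspace of $\mathbf{w}$'s with $A\mathbf{w} = 0$ intersects the set $S$ on the unit sphere.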
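Assumption (36) can be probed numerically for the illustrative cap $f(\mathbf{w}) = t - w_1$ (our own example). For this $f$ one can check that (19) holds with equality, so $\xi(f,\mathbf{g}) = w(f,\mathbf{g})$ has the closed form used below. The sketch (helper name hypothetical) estimates the mean and relative spread of $\xi(f,\mathbf{g})$ for growing $n$: the mean grows like $\sqrt{1-t^2}\sqrt{n}$ while the standard deviation stays of order one, so the relative spread should decay roughly like $1/\sqrt{n}$, in line with the concentration assumed in (36).

```python
import numpy as np


def cap_xi_samples(n: int, t: float, samples: int, rng: np.random.Generator) -> np.ndarray:
    """Samples of xi(f, g) = w(f, g) for f(w) = t - w_1 (closed form, see the cap example)."""
    g = rng.standard_normal((samples, n))
    norm = np.linalg.norm(g, axis=1)
    boundary = t * g[:, 0] + np.sqrt(1.0 - t * t) * np.linalg.norm(g[:, 1:], axis=1)
    return np.where(g[:, 0] >= t * norm, norm, boundary)


rng = np.random.default_rng(3)
t = 0.8
relative_spread = {}
for n in (100, 400, 1600):
    xi = cap_xi_samples(n, t, 2000, rng)
    relative_spread[n] = float(xi.std() / xi.mean())
    # mean / sqrt(n) should stay near sqrt(1 - t^2) = 0.6 while the relative spread shrinks
    print(n, round(xi.mean() / np.sqrt(n), 4), round(relative_spread[n], 4))
```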

We are now in a position to state the following theorem, which in a way complements Theorem 3.

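Theorem 4. (Trapped in a mesh) Let $m$ and $n$ be large, with $m < n$ but proportional to $n$. Let $S$ be a subset of the unit Euclidean sphere $S^{n-1}$ in $\mathbb{R}^n$. Moreover, let $S$ be such that it can be characterized through a function $f(\mathbf{w})$ in the following way

$$ S = \{\mathbf{w} \,|\, \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0\}. \qquad (40) $$

Let $Y$ be a random $(n-m)$-dimensional subspace of $\mathbb{R}^n$, distributed uniformly in the Grassmannian with respect to the Haar measure. For example, let

$$ Y = \{\mathbf{w} \,|\, A\mathbf{w} = 0\}, \qquad (41) $$

where $A$ is an $m \times n$ matrix of i.i.d. standard normals. Let $\mathbf{g}$ be an $n \times 1$ vector of i.i.d. standard normals. Further, let

$$ \xi_D(f) = E \min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right). \qquad (42) $$

Assume that $f(\mathbf{w})$ is such that (22) (or (23)) and (36) hold.

1) Let $\epsilon_1$ and $\epsilon_2$ be arbitrarily small constants and let $m$ be such that

$$ \xi_D(f) > (1+\epsilon_1)\sqrt{m} + \epsilon_2\sqrt{n}. \qquad (43) $$

Then

$$ \lim_{n\to\infty} P(Y \cap S \neq \emptyset) = 1. \qquad (44) $$

2) On the other hand, let $m$ be such that

$$ \xi_D(f) < \sqrt{m} - \frac{1}{4\sqrt{m}}. \qquad (45) $$

Then

$$ \lim_{n\to\infty} P(Y \cap S = \emptyset) = 1. \qquad (46) $$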

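Proof. The first part follows from the discussion presented above. For the second part we first observe from (19) and (33) that

$$ w(f,\mathbf{g}) \leq \min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) = \xi(f,\mathbf{g}). \qquad (47) $$

Then we have

$$ w_D(S) = w_D(f) = E\,w(f,\mathbf{g}) \leq E \min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) = E\,\xi(f,\mathbf{g}) = \xi_D(f). \qquad (48) $$

A combination of (48) and the condition given in (45) gives

$$ w_D(S) = w_D(f) \leq \xi_D(f) < \sqrt{m} - \frac{1}{4\sqrt{m}}, \qquad (49) $$

which is then enough to apply Theorem 3 and obtain (46). □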
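The dichotomy in Theorem 4 can be illustrated on the same hypothetical cap $f(\mathbf{w}) = t - w_1$, for which $\xi_D(f) = w_D(f) \approx \sqrt{1-t^2}\sqrt{n}$, so both parts of the theorem point to a transition near $m \approx \xi_D(f)^2$. The sketch below is an illustration under these assumptions, not a proof. It estimates $\xi_D(f)$ by Monte Carlo, draws Gaussian matrices $A$ for one $m$ below and one $m$ above the predicted transition, and records how often $\tau(f,A) \leq 0$, i.e. how often $Y = \{\mathbf{w} \,|\, A\mathbf{w} = 0\}$ meets $S$. Function names are hypothetical.

```python
import numpy as np


def meets_cap(A: np.ndarray, t: float) -> bool:
    """True iff {w : Aw = 0} meets the cap {||w||_2 = 1, w_1 >= t}, i.e. tau(f, A) <= 0."""
    m = A.shape[0]
    _, _, vh = np.linalg.svd(A)             # rows m..n-1 of vh span the null space (rank m)
    return bool(np.linalg.norm(vh[m:, 0]) >= t)


def xi_d_cap(n: int, t: float, samples: int, rng: np.random.Generator) -> float:
    """Monte Carlo estimate of xi_D(f) = w_D(f) for f(w) = t - w_1."""
    g = rng.standard_normal((samples, n))
    norm = np.linalg.norm(g, axis=1)
    boundary = t * g[:, 0] + np.sqrt(1.0 - t * t) * np.linalg.norm(g[:, 1:], axis=1)
    return float(np.where(g[:, 0] >= t * norm, norm, boundary).mean())


rng = np.random.default_rng(4)
n, t = 300, 0.6
xi_d = xi_d_cap(n, t, samples=4000, rng=rng)
print("predicted transition: m ~", round(xi_d ** 2, 1))

hit_rate = {}
for m in (150, 240):                        # one m below and one above xi_D(f)^2
    hits = [meets_cap(rng.standard_normal((m, n)), t) for _ in range(20)]
    hit_rate[m] = float(np.mean(hits))
    print(m, hit_rate[m])
```

With $n = 300$ and $t = 0.6$ the predicted transition is near $m \approx 0.64 \cdot 300 \approx 191$, so the subspace should essentially always meet the cap at $m = 150$ and essentially never at $m = 240$.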

In essence, the above theorem provides a characterization of sets $S$ for which one can determine, in a sense, an optimal maximal/minimal dimension of the missing/intersecting subspace $\{\mathbf{w} \,|\, A\mathbf{w} = 0\}$. Of course, the result of the previous theorem will be useful as long as one is able to handle (compute) $\xi_D(f)$. Also, one should note that there are numerous other ways of presenting the main results obtained above. We chose the way given in the above theorem in order to be as close as possible to the original formulation given in Theorem 3 and, at the same time, to maintain a presentation that would in a way hint at what the main ideas behind the entire mechanism are. For example, among many alternative formulations, the following two are probably even more natural than the version presented in the above theorem. First, instead of trying to formulate results along the lines of Theorem 3, one can formulate probabilistic results based on (38) and the corresponding ones that can be obtained in an analogous way for $w(f,\mathbf{g})$. We skip this exercise but do mention that, in the absence of Theorem 3, such a presentation would be our preferred one. Second, instead of relying on the quantity $\xi_D(f)$ one can rely on the original $w_D(S)$. Since this modification is relatively simple, we will provide a brief sketch of it below. We also mention that this modification will in the end produce results that are visually more similar to the ones given in the original formulation in Theorem 3. However, to achieve a mere similarity one is in a way forced to remodel the formulations given in Theorem 4, which in our view contain a bit of a flavor of how the entire mechanism works. That way one ultimately produces a visual analogue to Theorem 3, but at the expense of losing a bit of the hint as to what the core of the presented concept is. Still, we do believe that it is convenient to have such a formulation handy and we therefore present it below.
3.3 An alternative formulation

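The inequality given in (29) can be further extended in the following way

$$ \begin{aligned} P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right) & \geq (1 - e^{-\epsilon_2^{(m)} m})\, P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right) \\ & \geq (1 - e^{-\epsilon_2^{(m)} m})\, P\left(\max_{\|\mathbf{w}\|_2=1} \min_{\lambda_\nu \geq 0} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right) \\ & = (1 - e^{-\epsilon_2^{(m)} m})\, P\left(-\min_{\|\mathbf{w}\|_2=1} \max_{\lambda_\nu \geq 0} \left(-\mathbf{g}^T\mathbf{w} + \lambda_\nu f(\mathbf{w})\right) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right). \end{aligned} \qquad (50) $$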



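Using (17) one then has

$$ P\left(\min_{\lambda_\nu \geq 0} \max_{\|\mathbf{w}\|_2=1} \left(\mathbf{g}^T\mathbf{w} - \lambda_\nu f(\mathbf{w})\right) > \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right) \geq (1 - e^{-\epsilon_2^{(m)} m})\, P\left(w(f,\mathbf{g}) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right). \qquad (51) $$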
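Connecting (26), (27), (28), (29), (50), (51), and (31), we arrive at the following analogue to (32):

$$ P\left(\min_{\boldsymbol{\nu} \neq 0} \max_{\|\mathbf{w}\|_2=1} \left(-\boldsymbol{\nu}^T A\mathbf{w} - f(\mathbf{w})\right) > 0\right) \geq \frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\, P\left(w(f,\mathbf{g}) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}, \qquad (52) $$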

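which after using (24) becomes the following analogue to (35):

$$ P(-\tau(f,A) > 0) \geq \frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\, P\left(w(f,\mathbf{g}) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \qquad (53) $$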

As in the previous subsection, we will assume that $w(f,\mathbf{g})$ concentrates around $w_D(S) = w_D(f) = E\,w(f,\mathbf{g})$ (which is a bit easier to ensure than the concentration of $\xi(f,\mathbf{g})$; a way of doing so can be deduced from [18]) and that $w_D(S) = w_D(f) \sim \sqrt{n}$, i.e.

$$ P\left(|w(f,\mathbf{g}) - w_D(f)| > \epsilon_1^{(w)}\, w_D(f)\right) \leq e^{-\epsilon_2^{(w)} w_D(f)}. \qquad (54) $$

(The assumption can also be avoided; as mentioned above, one way of doing so, even for a fairly general $f$, is to follow the presentation of [18]. However, as was the case in the previous subsection, in the interest of maintaining as simple a presentation as possible, we will simply assume (54).) Moreover, let

$$ (1-\epsilon_1^{(w)})\, w_D(f) > (1+\epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}. \qquad (55) $$

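Then from (53) we have

$$ P(-\tau(f,A) > 0) \geq \frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\left(1 - e^{-\epsilon_2^{(w)} w_D(f)}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \qquad (56) $$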
Finally, if all assumptions we made indeed hold, then

$$ \lim_{n\to\infty} P(\tau(f,A) < 0) \geq \lim_{n\to\infty}\left(\frac{1 - e^{-\epsilon_2^{(m)} m}}{1 - e^{-\epsilon_6^{(g)} n}}\left(1 - e^{-\epsilon_2^{(w)} w_D(f)}\right) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}\right) = 1. \qquad (57) $$

In other words, if (22) (or (23)), (54), and (55) hold, then for large $n$ one has with overwhelming probability that the random subspace of $\mathbf{w}$'s with $A\mathbf{w} = 0$ intersects the set $S$ on the unit sphere.

We are now in a position to state the following theorem, which is an alternative formulation of Theorem 4 and which, like Theorem 4, in a way complements Theorem 3.

Theorem 5. (Trapped in a mesh - alternative) Let $m$ and $n$ be large, with $m < n$ but proportional to $n$. Let $S$ be a subset of the unit Euclidean sphere $S^{n-1}$ in $\mathbb{R}^n$. Moreover, let $S$ be such that it can be characterized through a function $f(\mathbf{w})$ in the following way

$$ S = \{\mathbf{w} \,|\, \|\mathbf{w}\|_2 = 1,\; f(\mathbf{w}) \leq 0\}. \qquad (58) $$

Let $Y$ be a random $(n-m)$-dimensional subspace of $\mathbb{R}^n$, distributed uniformly in the Grassmannian with respect to the Haar measure. For example, let

$$ Y = \{\mathbf{w} \,|\, A\mathbf{w} = 0\}, \qquad (59) $$

where $A$ is an $m \times n$ matrix of i.i.d. standard normals. Let $\mathbf{g}$ be an $n \times 1$ vector of i.i.d. standard normals. Further, let

$$ w_D(S) = w_D(f) = E \max_{\|\mathbf{w}\|_2=1,\; f(\mathbf{w}) \leq 0} \mathbf{g}^T\mathbf{w} = E \max_{\mathbf{w} \in S} \mathbf{g}^T\mathbf{w}. \qquad (60) $$

Assume that $f(\mathbf{w})$ is such that (22) (or (23)) and (54) hold.

1) Let $\epsilon_1$ and $\epsilon_2$ be arbitrarily small constants and let $m$ be such that

$$ w_D(S) = w_D(f) > (1+\epsilon_1)\sqrt{m} + \epsilon_2\sqrt{n}. \qquad (61) $$

Then

$$ \lim_{n\to\infty} P(Y \cap S \neq \emptyset) = 1. \qquad (62) $$

2) On the other hand, let $m$ be such that

$$ w_D(S) = w_D(f) < \sqrt{m} - \frac{1}{4\sqrt{m}}. \qquad (63) $$

Then

$$ \lim_{n\to\infty} P(Y \cap S = \emptyset) = 1. \qquad (64) $$

Proof. The first part follows from the discussion presented above. The second part follows from Theorem 3 and parts of its proof given in [18]. □

Visually speaking, Theorem 5 may seem a more natural complement to Theorem 3. It is probably even a bit simpler than the formulation given in Theorem 4. On the other hand, the formulation in Theorem 4 is still our preferred one. In a way, it contains a bit of a description of what really is the key to the success of the entire mechanism. If one is to give only the second portion of these theorems, we do believe that Theorem 5 is then the more suitable choice (not surprisingly, that is exactly what was done in [18]).
3.4 Comments

As far as the understanding of the above theorems goes, there are several comments that we believe are in order. Below are some of them.

1. When one compares the statements of Theorems 4 and 5 on one side and the statement of Theorem 3 on the other, it is clear that the concentration results are stated differently. In fact, not only are they stated differently, they are also considerably weaker in Theorems 4 and 5. We did mention right after Theorem 3 that determining the concentration constants is, to the best of our knowledge, an open problem even in the original formulation given in Theorem 3. The same remains true for both of our theorems. The difference, though, is that while the constants in Theorem 3 are most likely not the best possible ones, they are, when compared to the generic $\epsilon$'s given in our theorems, much better. We do mention that in this paper our major concern was a general type of result concerning the relation between $w_D(S)$ ($\xi_D$) and $m$ rather than a precise concentration analysis. Still, it would be of great importance if one could provide a much more precise analysis and determine the ultimate optimality of the concentration constants as well. Our $\epsilon$'s can relatively easily be translated into concrete numbers. However, determining their optimal values is what actually requires a more careful approach. In fact, quite possibly, one may end up with optimal constants that are very large (simply because one would have to encompass the entire family of sets $S$; such is the standard set by the generality of some of the results presented in Theorems 3, 4, and 5). This is partially the reason why we have not stated any specific constants but rather left such a problem to be solved on a case-by-case basis.

2. Another important question that may arise based on our presentation is which of the many alternative formulations would be the best possible. Answering such a question seems rather hard. Our experience is that when the mechanism works, typically everything (every quantity of interest) concentrates, and if one is then fine with ignoring the specifics of the concentrations, essentially all formulations are fine.

3. The results presented above will not hold for all sets $S$. The question then remains whether one can determine the class of sets $S$ for which they will hold (such a subclass is determined by the two above theorems).

4. How hard is it for a function to actually satisfy the assumptions that we have made? This is again a very generic question, and it seems better to form a class of functions for which the assumptions do hold than to try to exclude those for which they do not.

5. How limiting (or general) is our description of the set $S$? In reality, the description of the set $S$ that we assumed is rather simple: we basically assumed that the entire set can be characterized through a single functional inequality. However, this assumption was made mostly for exposition purposes. The entire mechanism would go through as well even if the set $S$ were characterized by an arbitrary number, say $L$, of functional inequalities, i.e., $f^{(i)}(\mathbf{w}) \leq 0$, $i = 1, 2, \dots, L$.

4 Discussion - how all of it actually works 

While the results presented in the previous section may seem a bit dry, they are actually quite powerful. However, to really get a feeling for how powerful they are, one has to be convinced that there are scenarios where they can be used. While conceptually we discovered an array of sets $S$ for which the subspace dimension results of Theorem 3 eventually (through Theorems 4 and 5) become optimal, we believe that the concept is easier to grasp on small examples. That is precisely why in the first part of the paper we briefly presented a problem that we were able to attack to full optimality using the mechanisms formulated in Theorems 4 and 5. Below we briefly sketch how the results presented in Section 1 actually fit into the context of the machinery presented in the previous section. Before doing so, we provide a small example that shows how the entire machinery can be modified slightly if the function $f(\mathbf{w})$ is of a special type.

4.1 Homogeneous $f(\mathbf{w})$

When the function $f(\mathbf{w})$ is homogeneous, one can slightly change the presentation described above. In fact, the presentation can be changed in many other scenarios as well; we selected this one just to give a flavor of the possible options. Another reason is that it makes sketching how the results given in Section 1 fit into what was presented above a bit easier. Now, let $f(\mathbf{w})$ be a homogeneous function; namely, let $f(\mathbf{w})$ be such that

$$f(a\mathbf{w}) = a^d f(\mathbf{w}), \tag{65}$$

for any $a > 0$ and some $d > 0$. Then we say that the function $f(\mathbf{w})$ is positive homogeneous of degree $d$. For all practical purposes, one can then redefine $\tau(f,A)$ from (20) in the following way

$$\tau^{(h)}(f,A) = \min_{\mathbf{w}} f(\mathbf{w}) \quad \text{subject to} \quad A\mathbf{w} = \mathbf{0},\ \ \|\mathbf{w}\|_2 \leq 1. \tag{66}$$
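A short derivation may make the redefinition in (66) more transparent. It is a sketch that rests on two assumptions not spelled out here. First, $\tau(f,A)$ in (20) is taken to be the same minimization over the unit sphere $\|\mathbf{w}\|_2 = 1$, as the closing remark of this subsection suggests. Second, $m < n$, so that the null space of $A$ contains nonzero vectors. Write any nonzero feasible $\mathbf{w}$ as $t\mathbf{u}$ with $t = \|\mathbf{w}\|_2 \in (0,1]$ and $\|\mathbf{u}\|_2 = 1$. Then:

$$
\begin{aligned}
f(\mathbf{0}) = f(2\cdot\mathbf{0}) = 2^d f(\mathbf{0}),\ d > 0 \quad &\Longrightarrow \quad f(\mathbf{0}) = 0,\\
f(t\mathbf{u}) = t^d f(\mathbf{u}), \qquad A(t\mathbf{u}) = \mathbf{0} \quad &\Longleftrightarrow \quad A\mathbf{u} = \mathbf{0},\\
\tau^{(h)}(f,A) = \min\Big(f(\mathbf{0}),\ \inf_{t \in (0,1]} t^d\,\tau(f,A)\Big) &= \min\big(0,\ \tau(f,A)\big).
\end{aligned}
$$

Hence $\tau^{(h)}(f,A) < 0$ exactly when $\tau(f,A) < 0$. A negative value found on the sphere survives the ball relaxation unchanged. A nonnegative one collapses to the value $0$ attained at $\mathbf{w} = \mathbf{0}$.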

Proceeding then as in Section 3.1, one can write

$$\tau^{(h)}(f,A) = \min_{\|\mathbf{w}\|_2 \leq 1} \max_{\nu}\ f(\mathbf{w}) + \nu^T A\mathbf{w}, \tag{67}$$

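and assume that the structure of the set $S$ is determined by a function $f(\mathbf{w})$ for which it also holds that

$$\tau^{(h)}(f,A) = \min_{\|\mathbf{w}\|_2 \leq 1} \max_{\nu}\ f(\mathbf{w}) + \nu^T A\mathbf{w} = \max_{\nu} \min_{\|\mathbf{w}\|_2 \leq 1}\ f(\mathbf{w}) + \nu^T A\mathbf{w}. \tag{68}$$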

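If (as in Section 3.1) one instead focuses only on the sign of $\tau^{(h)}(f,A)$, one can relax requirement (68) a bit to

$$\operatorname{sign}\big(\tau^{(h)}(f,A)\big) = \operatorname{sign}\Big(\min_{\|\mathbf{w}\|_2 \leq 1} \max_{\nu}\ f(\mathbf{w}) + \nu^T A\mathbf{w}\Big) = \operatorname{sign}\Big(\max_{\nu} \min_{\|\mathbf{w}\|_2 \leq 1}\ f(\mathbf{w}) + \nu^T A\mathbf{w}\Big). \tag{69}$$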

After rearranging (68) a bit, we have

$$-\tau^{(h)}(f,A) = \min_{\nu} \max_{\|\mathbf{w}\|_2 \leq 1}\ -f(\mathbf{w}) - \nu^T A\mathbf{w}, \tag{70}$$

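and after rearranging (69) a bit, we have

$$-\operatorname{sign}\big(\tau^{(h)}(f,A)\big) = \operatorname{sign}\Big(\min_{\nu} \max_{\|\mathbf{w}\|_2 \leq 1}\ -f(\mathbf{w}) - \nu^T A\mathbf{w}\Big). \tag{71}$$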

Now one can repeat all the derivations from Section 3.2 with $\tau^{(h)}(f,A)$ instead of $\tau(f,A)$. As a final result, one would end up with theorems that are exactly the same as Theorems 4 and 5. The only difference is that the assumptions on $f(\mathbf{w})$ would be those from (68) (or (69)) instead of those from (22) (or (23)). This is somewhat convenient, since it essentially boils down to duality over a convex set (the unit ball). Of course, everything we mentioned in this subsection remains true for any function for which the sign of $\tau(f,A)$ from (20) does not change if one relaxes the sphere condition to the ball condition.

4.2 An example of set S where everything works 

In this subsection we sketch how the results presented in Section 1 fit into the framework given in Section 3.2. We first recall that the problem we were interested in in Section 1 is essentially the following. For a given $n$-dimensional $k$-sparse vector $\tilde{\mathbf{x}}$ (with, say, the last $n-k$ components equal to zero), can one estimate the dimensions of matrices $A$ in (1) such that the solution of (2) is actually $k$-sparse? Let us be a bit more specific. Consider a $k$-sparse vector $\tilde{\mathbf{x}}$. Given the statistical structure that will later be assumed on $A$, one can, without loss of generality, set $\tilde{\mathbf{x}}_i = 0$, $i = k+1, k+2, \dots, n$. The question of interest is then the following. Given $A$ and $A\tilde{\mathbf{x}}$ (where $A$ is an $m \times n$ matrix, typically called the measurement matrix), can one find $\mathbf{x}$ such that

$$A\mathbf{x} = A\tilde{\mathbf{x}}. \tag{72}$$

To maintain consistency, we emphasize that $A\tilde{\mathbf{x}}$ in (72) is what $\mathbf{y}$ was in Section 1. In other words, although we did not state it explicitly in Section 1, $\mathbf{y}$ was implicitly assumed to be constructed as the product of the matrix $A$ and a $k$-sparse vector $\tilde{\mathbf{x}}$. As we mentioned in Section 1, a popular way to attack the above problem is to solve (2), i.e., the following optimization problem

$$\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad A\mathbf{x} = A\tilde{\mathbf{x}}. \tag{73}$$
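To make (73) concrete, here is a minimal sketch, not the authors' code, that solves it as a linear program with SciPy's `linprog`. It uses the standard split $\mathbf{x} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \geq 0$ and assumes a SciPy version that provides the HiGHS backend. The problem sizes, the random seed, and the helper names `l1_minimize` and `sparse_instance` are illustrative assumptions. The only modeling choices taken from the text are the i.i.d. standard normal $A$ and a $\tilde{\mathbf{x}}$ whose last $n-k$ entries are zero.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program.

    Standard split x = u - v with u, v >= 0: at an optimum u and v have
    disjoint supports, so sum(u + v) equals ||x||_1.
    """
    n = A.shape[1]
    c = np.ones(2 * n)                   # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])            # encodes A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


def sparse_instance(n, k, m, rng):
    """Gaussian m x n matrix and a k-sparse x_tilde whose last n - k entries are zero."""
    A = rng.standard_normal((m, n))
    x_tilde = np.zeros(n)
    x_tilde[:k] = rng.standard_normal(k)
    return A, x_tilde


rng = np.random.default_rng(0)
n, k, m = 40, 2, 24
A, x_tilde = sparse_instance(n, k, m, rng)
x_hat = l1_minimize(A, A @ x_tilde)
print("max |x_hat - x_tilde| =", np.max(np.abs(x_hat - x_tilde)))
```

With $k$ this small relative to $m$, the printed error is typically at the level of the solver tolerance, i.e., $\hat{\mathbf{x}} = \tilde{\mathbf{x}}$. Shrinking $m$ toward $k$ makes failures appear.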

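While the original problem (72) (finding a $k$-sparse solution) is NP-hard in the worst case, the optimization problem in (73) can be cast as a linear program and is therefore solvable in polynomial time. Let $\hat{\mathbf{x}}$ be the solution of (73). The question then is how often (if ever) $\hat{\mathbf{x}} = \tilde{\mathbf{x}}$. The line of thought first goes through the recognition that $\hat{\mathbf{x}}$ will be the $k$-sparse $\tilde{\mathbf{x}}$ only if there is no $\mathbf{w}$ such that $\hat{\mathbf{x}} = \tilde{\mathbf{x}} + \mathbf{w}$, where $\mathbf{w}$ is in the null space of $A$ and satisfies (see, e.g., [30])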



$$-\sum_{i=1}^{k}\mathbf{w}_i \geq \sum_{i=k+1}^{n}|\mathbf{w}_i|. \tag{74}$$



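If one then defines the set $S$ on the unit sphere $\mathbb{S}^{n-1}$ based on this parametrization of non-favorable $\mathbf{w}$'s, one effectively obtains

$$S = \Big\{\mathbf{w} \,\Big|\, \sum_{i=1}^{k}\mathbf{w}_i + \sum_{i=k+1}^{n}|\mathbf{w}_i| \leq 0,\ \|\mathbf{w}\|_2 = 1\Big\}. \tag{75}$$

If one then defines $f(\mathbf{w})$ as

$$f(\mathbf{w}) = \sum_{i=1}^{k}\mathbf{w}_i + \sum_{i=k+1}^{n}|\mathbf{w}_i|, \tag{76}$$

then clearly we have

$$S = \{\mathbf{w} \mid f(\mathbf{w}) \leq 0,\ \|\mathbf{w}\|_2 = 1\}, \tag{77}$$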

which fits into the description of $S$ given in (11). Moreover, $S$ and ultimately $f$ do satisfy all the assumptions that we have made. Namely, $f(\mathbf{w})$ from (76) is positive homogeneous of degree 1. It is also convex, so the duality in (68) and (69) is a standard convex minimax statement over the compact convex unit ball. Also, let (as in Theorems 4 and 5)

$$Y = \{\mathbf{w} \mid \mathbf{w} \in \mathbb{R}^n,\ A\mathbf{w} = \mathbf{0}\}. \tag{78}$$
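The objects in (76) and (77) are simple enough to check numerically. The sketch below evaluates $f(\mathbf{w})$ and tests membership in $S$; the function names `f` and `in_S` are chosen for illustration only. It also verifies two facts used above. The first is positive homogeneity of degree 1. The second is the identity $\|\tilde{\mathbf{x}} + \mathbf{w}\|_1 - \|\tilde{\mathbf{x}}\|_1 = f(\mathbf{w})$, which links $f(\mathbf{w}) \leq 0$ (equivalently (74)) to $\tilde{\mathbf{x}} + \mathbf{w}$ being at least as good as $\tilde{\mathbf{x}}$ for (73). That identity relies on an assumption not stated in this passage: the first $k$ entries of $\tilde{\mathbf{x}}$ are positive, and $\mathbf{w}$ is small enough not to flip their signs.

```python
import numpy as np


def f(w, k):
    """f(w) from (76): plain sum of the first k entries plus the l1 norm of the rest."""
    w = np.asarray(w, dtype=float)
    return w[:k].sum() + np.abs(w[k:]).sum()


def in_S(w, k, tol=1e-9):
    """Membership in S from (77): f(w) <= 0 and ||w||_2 = 1, both up to tol."""
    return bool(f(w, k) <= tol and abs(np.linalg.norm(w) - 1.0) <= tol)


rng = np.random.default_rng(1)
n, k = 10, 3

# Assumption (not from the text): x_tilde is positive on its support {1, ..., k}.
x_tilde = np.zeros(n)
x_tilde[:k] = rng.uniform(1.0, 2.0, size=k)

# Random perturbation scaled so that |w_i| <= 0.5 < x_tilde_i for i <= k,
# hence x_tilde_i + w_i keeps its (positive) sign on the support.
w = rng.standard_normal(n)
w *= 0.5 / np.max(np.abs(w))

l1_change = np.abs(x_tilde + w).sum() - np.abs(x_tilde).sum()
print(np.isclose(l1_change, f(w, k)))             # True: the l1 change equals f(w)
print(np.isclose(f(3.0 * w, k), 3.0 * f(w, k)))   # True: positive homogeneity, degree 1

# A unit vector that lies in S: all of its mass sits negatively on the support.
v = np.zeros(n)
v[0] = -1.0
print(in_S(v, k))                                 # True, since f(v) = -1
print(in_S(np.ones(n) / np.sqrt(n), k))           # False, since f > 0 there
```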

Now, if one looks at all $\mathbf{w}$'s from the null space of $A$, i.e., at the set $Y$, one can connect the intersection of the sets $Y$ and $S$ with whether or not $\hat{\mathbf{x}}$ equals $\tilde{\mathbf{x}}$. Namely, if $Y \cap S = \emptyset$ then $\hat{\mathbf{x}} = \tilde{\mathbf{x}}$. If $Y \cap S \neq \emptyset$, then there will be an $\tilde{\mathbf{x}}$ such that $\tilde{\mathbf{x}}_i = 0$, $i = k+1, k+2, \dots, n$, and $\hat{\mathbf{x}} \neq \tilde{\mathbf{x}}$. Now, view the problem in a random context, with $A$ being an $m \times n$ matrix of i.i.d. standard normals. Then, for a given ratio $\frac{k}{n}$, one can determine the critical value $\frac{m_c}{n} = \frac{w_d(S)^2}{n}$ of the ratio $\frac{m}{n}$. For $\frac{m}{n} > \frac{m_c}{n}$, with overwhelming probability $\hat{\mathbf{x}} = \tilde{\mathbf{x}}$ for all $\tilde{\mathbf{x}}$ such that $\tilde{\mathbf{x}}_i = 0$, $i = k+1, k+2, \dots, n$. On the other hand, for $\frac{m}{n} < \frac{m_c}{n}$, with overwhelming probability there is an $\tilde{\mathbf{x}}$ such that $\tilde{\mathbf{x}}_i = 0$, $i = k+1, k+2, \dots, n$, and $\hat{\mathbf{x}} \neq \tilde{\mathbf{x}}$. To be more in alignment with our theorems, instead of "with overwhelming probability" we should in fact say "with a probability that goes to one as $n \to \infty$".
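The threshold behavior described above can be observed empirically. The following sketch estimates the fraction of random instances on which (73) returns $\tilde{\mathbf{x}}$, for several values of $m$. As before, the sizes, seed, and helper names are illustrative assumptions, and SciPy's `linprog` with the HiGHS backend is an assumed solver. The sketch does not compute $w_d(S)$. It only displays the change in success rate that the text attributes to crossing $m_c$; at such a small $n$ the transition is visibly blurred.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A, y):
    """min ||x||_1 s.t. A x = y, via the LP split x = u - v with u, v >= 0."""
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


def success_rate(n, k, m, trials, rng, tol=1e-6):
    """Fraction of trials in which l1 minimization recovers a k-sparse x_tilde."""
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))           # i.i.d. standard normal measurements
        x_tilde = np.zeros(n)
        x_tilde[:k] = rng.standard_normal(k)      # last n - k entries are zero
        x_hat = l1_minimize(A, A @ x_tilde)
        hits += int(np.max(np.abs(x_hat - x_tilde)) <= tol)
    return hits / trials


rng = np.random.default_rng(2)
n, k, trials = 30, 3, 10
rates = {m: success_rate(n, k, m, trials, rng) for m in (4, 8, 12, 16, 20, 24)}
for m, rate in rates.items():
    print(f"m/n = {m / n:.2f}   empirical success rate = {rate:.1f}")
```

Increasing `n` and `trials` while keeping $k/n$ fixed should sharpen the jump. The exact location of the jump is what the connection to $w_d(S)$ is meant to predict.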

Of course, what we presented above only shows how the critical $m_c$ can be connected to $w_d(S)$. In a way, that solves only half of the problem. The second half is to actually determine $w_d(S)$, which relates to question 1) that we mentioned in the short discussion after Theorem 3. Our main concern here, on the other hand, is question 2) from that same discussion, and along the same lines, the details related to handling $w_d(S)$ go beyond the scope of this paper. We do mention in passing, however, that computing $w_d(S)$ was one of the problems of interest in [30, 33], and the results obtained there are precisely those presented in Theorem 1.

Also, what we presented in this section is a simple way to interpret the entire mechanism from the previous sections for a particular set $S$. The interpretation given above relates to a rather simple set $S$. A more complicated version of $S$ for which everything also works can be found in, e.g., [28, 31].

5 Conclusion 

In this paper we revisited a couple of classic probability results from [18]. These results relate to the geometry of the intersection of random subspaces and subsets of the unit sphere in $\mathbb{R}^n$, and to properties of Gaussian processes. Namely, in [18], the likelihood of a random subspace of $\mathbb{R}^n$ of dimension $n-m$ intersecting a given set $S$ on the unit sphere was connected to a quantity describing the set $S$ called the Gaussian width. Moreover, it was shown that $m$ can go (roughly speaking) as low as the squared Gaussian width without any significant likelihood of the random $(n-m)$-dimensional subspace intersecting the set $S$. In this paper we provided a characterization of a class of sets $S$ with the following property: if $m$ goes lower than the squared Gaussian width of $S$, then the random $(n-m)$-dimensional subspace will intersect the set $S$ with high probability. In a way, we provided a partial complement to the results of [18].

Also, to give a bit more flavor to a rather dry presentation of high-dimensional geometry, we gave a fairly detailed presentation of how the results that we created can in fact be utilized. We chose an example that deals with solving under-determined systems of linear equations with sparse solutions. It turns out that when the systems are random and Gaussian, the success of a technique called $\ell_1$-optimization, when used to solve them, can be connected to the problem of random subspaces intersecting a given set $S$ on the unit sphere. We described how such a connection can be established and then sketched how the main results of this paper work once such a connection is established.

While we presented only one specific example to give a flavor of how everything works in practice, the overall methodology is much more powerful. There are various other instances in which we were able to successfully employ the majority of the ideas presented here. Moreover, the mechanisms presented here are in fact a special case of a much larger concept. In this paper, though, our focus was on the particular geometric results established in [18] and on how one can complement them. On the other hand, when viewed outside the scope of the results presented in [18], our methodology admits the consideration of substantially more general concepts. This goes well beyond the particular problems that we considered in this paper, and we will present it elsewhere.
Finally, it is quite likely that Gordon's original results that we revisited here were only a tool towards much higher mathematical goals. Among them would immediately be a better version of the Dvoretzky theorem, already established in Gordon's original work. Our results can then be used to complement all such results where Gordon's estimates turned out to be of use. Of course, revisiting all of these requires a substantial effort that goes well beyond what we planned to present here. Here we focused only on the heart of the idea, which essentially boils down to a simple reuse of Gordon's mechanism (with a little bit of our own recognition that duality theory can be quite powerful) to prove its own optimality.